In-class Exercise 7: Geographically Weighted Regression (GWR)
pacman::p_load(olsrr, GWmodel, corrplot, ggpubr, sf, spdep, tidyverse, tmap, gtsummary, broom.helpers,
               ggstatsplot, performance, sfdep, see)
1 Overview: Calibrating Hedonic Pricing Model for Private Highrise Properties with GWR Method
Geographically weighted regression (GWR) is a spatial statistical technique that takes non-stationary variables into consideration (e.g., climate; demographic factors; physical environment characteristics) and models the local relationships between these independent variables and an outcome of interest (also known as the dependent variable). In this hands-on exercise, we will learn how to build hedonic pricing models by using GWR methods. The dependent variable is the resale price of condominiums in 2015. The independent variables are divided into structural and locational variables.
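For reference, the basic GWR model is commonly written as shown below. The notation is a textbook convention and an assumption here, not something defined in this exercise: $(u_i, v_i)$ are the coordinates of observation $i$, $x_{ik}$ is the value of the $k$-th of $p$ independent variables, and $\varepsilon_i$ is the error term.

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
```

A global multiple linear regression is the special case in which every $\beta_k$ is the same at all locations.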
2 The Data
Two data sets are used in this model building exercise:
- URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL), and
- condo_resale_2015 in csv format (i.e. condo_resale_2015.csv)
3 Getting Started
Before getting started, it is important to install the necessary R packages into R and launch these R packages into the R environment.
The R packages needed for this exercise are as follows:
- Building Ordinary Least Squares (OLS) regression models and performing diagnostic tests: olsrr
- Calibrating the geographically weighted family of models: GWmodel
- Multivariate data visualisation and analysis: corrplot
- Spatial data handling: sf
- Attribute data handling: tidyverse, especially readr, ggplot2 and dplyr
- Choropleth mapping: tmap
- Presentation-ready data summary and analytic result tables: gtsummary
- Utilities for computing indices of model quality and goodness of fit: performance
- Publication-ready visualisations for model parameters, predictions and performance diagnostics: see
The pacman::p_load() call shown at the top of this document installs these R packages, where necessary, and launches them into the R environment.
4 Short note about GWmodel
GWmodel package provides a collection of localised spatial statistical methods, namely: GW summary statistics, GW principal components analysis, GW discriminant analysis and various forms of GW regression; some of which are provided in basic and robust (outlier resistant) forms. More commonly, outputs or parameters of the GWmodel are mapped to provide a useful exploratory tool, which can often precede (and direct) a more traditional or sophisticated statistical analysis.
5 Importing the data
5.1 Importing geospatial data
The geospatial data used in this hands-on exercise is called MP14_SUBZONE_WEB_PL. It is in ESRI shapefile format. The shapefile consists of URA Master Plan 2014’s planning subzone boundaries. Polygon features are used to represent these geographic boundaries. The GIS data is in the SVY21 projected coordinate system.
The code chunk below is used to import the MP14_SUBZONE_WEB_PL shapefile by using st_read() of the sf package. It also updates the newly imported mpsz sf object with the correct EPSG code (i.e. 3414).
mpsz <- st_read(dsn = "data/geospatial",
                layer = "MP14_SUBZONE_WEB_PL") %>%
  st_transform(3414)
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\zjho008\ISSS626-GAA\In-class_Ex\In-class_Ex07\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
The result above shows that the R object used to contain the imported MP14_SUBZONE_WEB_PL shapefile is called mpsz and that it is a simple feature object. The geometry type is MULTIPOLYGON. It is also important to note that the layer, as read from the shapefile, is reported as SVY21 without EPSG information; the st_transform(3414) step in the pipeline above is what assigns the EPSG code to mpsz.
The projection of the transformed mpsz object is then verified by using st_crs() of the sf package, as in the code chunk below.
st_crs(mpsz)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Notice that the EPSG code is now indicated as 3414.
ID["EPSG",3414]]
Next, the extent of mpsz is revealed by using st_bbox() of the sf package.
st_bbox(mpsz)
     xmin      ymin      xmax      ymax
 2667.538 15748.721 56396.440 50256.334
The results above show the extent of mpsz.
5.2 Importing the aspatial data
The condo_resale_2015 data is in csv file format. The code chunk below uses the read_csv() function of the readr package to import condo_resale_2015 into R as a tibble data frame called condo_resale.
condo_resale <- read_csv("data/aspatial/Condo_resale_2015.csv")
Rows: 1436 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (23): LATITUDE, LONGITUDE, POSTCODE, SELLING_PRICE, AREA_SQM, AGE, PROX_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
After importing the aspatial data file into R, it is important to examine whether it has been imported correctly.
The code chunks below use glimpse() and head() to display the data structure.
glimpse(condo_resale)
Rows: 1,436
Columns: 23
$ LATITUDE <dbl> 1.287145, 1.328698, 1.313727, 1.308563, 1.321437,…
$ LONGITUDE <dbl> 103.7802, 103.8123, 103.7971, 103.8247, 103.9505,…
$ POSTCODE <dbl> 118635, 288420, 267833, 258380, 467169, 466472, 3…
$ SELLING_PRICE <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1320…
$ AREA_SQM <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 168,…
$ AGE <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22, 6,…
$ PROX_CBD <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783402…
$ PROX_CHILDCARE <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543, 0…
$ PROX_ELDERLYCARE <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.121…
$ PROX_URA_GROWTH_AREA <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.410632,…
$ PROX_HAWKER_MARKET <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969, 0…
$ PROX_KINDERGARTEN <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076, 0…
$ PROX_MRT <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.528…
$ PROX_PARK <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.116…
$ PROX_PRIMARY_SCH <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.709…
$ PROX_TOP_PRIMARY_SCH <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.709…
$ PROX_SHOPPING_MALL <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.307…
$ PROX_SUPERMARKET <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.581…
$ PROX_BUS_STOP <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340, 0…
$ NO_Of_UNITS <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34, 3…
$ FAMILY_FRIENDLY <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
$ FREEHOLD <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
head(condo_resale$LONGITUDE) # first few values of the LONGITUDE (x-coordinate) column
[1] 103.7802 103.8123 103.7971 103.8247 103.9505 103.9386
head(condo_resale$LATITUDE) # first few values of the LATITUDE (y-coordinate) column
[1] 1.287145 1.328698 1.313727 1.308563 1.321437 1.314198
Following which, summary() of base R is used to display the summary statistics of condo_resale tibble data frame.
summary(condo_resale)
LATITUDE LONGITUDE POSTCODE SELLING_PRICE
Min. :1.240 Min. :103.7 Min. : 18965 Min. : 540000
1st Qu.:1.309 1st Qu.:103.8 1st Qu.:259849 1st Qu.: 1100000
Median :1.328 Median :103.8 Median :469298 Median : 1383222
Mean :1.334 Mean :103.8 Mean :440439 Mean : 1751211
3rd Qu.:1.357 3rd Qu.:103.9 3rd Qu.:589486 3rd Qu.: 1950000
Max. :1.454 Max. :104.0 Max. :828833 Max. :18000000
AREA_SQM AGE PROX_CBD PROX_CHILDCARE
Min. : 34.0 Min. : 0.00 Min. : 0.3869 Min. :0.004927
1st Qu.:103.0 1st Qu.: 5.00 1st Qu.: 5.5574 1st Qu.:0.174481
Median :121.0 Median :11.00 Median : 9.3567 Median :0.258135
Mean :136.5 Mean :12.14 Mean : 9.3254 Mean :0.326313
3rd Qu.:156.0 3rd Qu.:18.00 3rd Qu.:12.6661 3rd Qu.:0.368293
Max. :619.0 Max. :37.00 Max. :19.1804 Max. :3.465726
PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_HAWKER_MARKET PROX_KINDERGARTEN
Min. :0.05451 Min. :0.2145 Min. :0.05182 Min. :0.004927
1st Qu.:0.61254 1st Qu.:3.1643 1st Qu.:0.55245 1st Qu.:0.276345
Median :0.94179 Median :4.6186 Median :0.90842 Median :0.413385
Mean :1.05351 Mean :4.5981 Mean :1.27987 Mean :0.458903
3rd Qu.:1.35122 3rd Qu.:5.7550 3rd Qu.:1.68578 3rd Qu.:0.578474
Max. :3.94916 Max. :9.1554 Max. :5.37435 Max. :2.229045
PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_TOP_PRIMARY_SCH
Min. :0.05278 Min. :0.02906 Min. :0.07711 Min. :0.07711
1st Qu.:0.34646 1st Qu.:0.26211 1st Qu.:0.44024 1st Qu.:1.34451
Median :0.57430 Median :0.39926 Median :0.63505 Median :1.88213
Mean :0.67316 Mean :0.49802 Mean :0.75471 Mean :2.27347
3rd Qu.:0.84844 3rd Qu.:0.65592 3rd Qu.:0.95104 3rd Qu.:2.90954
Max. :3.48037 Max. :2.16105 Max. :3.92899 Max. :6.74819
PROX_SHOPPING_MALL PROX_SUPERMARKET PROX_BUS_STOP NO_Of_UNITS
Min. :0.0000 Min. :0.0000 Min. :0.001595 Min. : 18.0
1st Qu.:0.5258 1st Qu.:0.3695 1st Qu.:0.098356 1st Qu.: 188.8
Median :0.9357 Median :0.5687 Median :0.151710 Median : 360.0
Mean :1.0455 Mean :0.6141 Mean :0.193974 Mean : 409.2
3rd Qu.:1.3994 3rd Qu.:0.7862 3rd Qu.:0.220466 3rd Qu.: 590.0
Max. :3.4774 Max. :2.2441 Max. :2.476639 Max. :1703.0
FAMILY_FRIENDLY FREEHOLD LEASEHOLD_99YR
Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.4868 Mean :0.4227 Mean :0.4882
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000
5.3 Converting aspatial data frame into a sf object
The condo_resale tibble data frame is aspatial. We will convert it to an sf object. The code chunk below converts the condo_resale data frame into a simple feature data frame by using st_as_sf() of the sf package.
# Convert the condo resale data into a simple feature data frame using its
# longitude and latitude columns. (For shapefiles, the projected coordinate
# system is recorded in the .prj file.)
condo_resale_sf <- st_as_sf(condo_resale,
                            coords = c("LONGITUDE", "LATITUDE"),
                            crs = 4326) %>% # the original data source is in WGS84
  st_transform(crs = 3414) # project into SVY21, the projected CRS of Singapore (EPSG:3414)
condo_resale_sf # Condo resale sf data frame
Simple feature collection with 1436 features and 21 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 14940.85 ymin: 24765.67 xmax: 43352.45 ymax: 48382.81
Projected CRS: SVY21 / Singapore TM
# A tibble: 1,436 × 22
POSTCODE SELLING_PRICE AREA_SQM AGE PROX_CBD PROX_CHILDCARE
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 118635 3000000 309 30 7.94 0.166
2 288420 3880000 290 32 6.61 0.280
3 267833 3325000 248 33 6.90 0.429
4 258380 4250000 127 7 4.04 0.395
5 467169 1400000 145 28 11.8 0.119
6 466472 1320000 139 22 10.3 0.125
7 309502 3410000 218 24 4.24 0.326
8 468497 1420000 141 24 11.6 0.162
9 118450 2025000 165 27 6.46 0.123
10 268157 2550000 168 31 6.52 0.609
# ℹ 1,426 more rows
# ℹ 16 more variables: PROX_ELDERLYCARE <dbl>, PROX_URA_GROWTH_AREA <dbl>,
# PROX_HAWKER_MARKET <dbl>, PROX_KINDERGARTEN <dbl>, PROX_MRT <dbl>,
# PROX_PARK <dbl>, PROX_PRIMARY_SCH <dbl>, PROX_TOP_PRIMARY_SCH <dbl>,
# PROX_SHOPPING_MALL <dbl>, PROX_SUPERMARKET <dbl>, PROX_BUS_STOP <dbl>,
# NO_Of_UNITS <dbl>, FAMILY_FRIENDLY <dbl>, FREEHOLD <dbl>,
# LEASEHOLD_99YR <dbl>, geometry <POINT [m]>
Notice that st_transform() of the sf package is used to convert the coordinates from WGS84 (i.e. crs = 4326) to SVY21 (i.e. crs = 3414).
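The conversion can be tried on a couple of records in isolation. The sketch below is a minimal illustration, assuming only that the sf package is installed; the data frame name toy is hypothetical, and the two coordinate pairs are copied from the glimpse() output above, so they are rounded as printed.

```r
library(sf)

# Two records copied from the glimpse() output (rounded as printed);
# `toy` is a hypothetical name used only for this illustration.
toy <- data.frame(
  POSTCODE  = c(118635, 288420),
  LONGITUDE = c(103.7802, 103.8123),
  LATITUDE  = c(1.287145, 1.328698)
)

# Declare the source CRS (WGS84, EPSG:4326). The x coordinate comes first,
# so longitude precedes latitude in `coords`.
toy_wgs84 <- st_as_sf(toy, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)

# Reproject to SVY21 / Singapore TM (EPSG:3414); coordinates are now in metres.
toy_svy21 <- st_transform(toy_wgs84, crs = 3414)

st_crs(toy_svy21)$epsg      # EPSG code of the projected object
st_coordinates(toy_svy21)   # easting (X) and northing (Y) in metres
```

Note that st_as_sf() only declares which CRS the raw numbers are in, whereas st_transform() actually recomputes the coordinates.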
Next, head() is used to list the contents of the condo_resale_sf object.
head(condo_resale_sf)
Simple feature collection with 6 features and 21 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22085.12 ymin: 29951.54 xmax: 41042.56 ymax: 34546.2
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 22
POSTCODE SELLING_PRICE AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 118635 3000000 309 30 7.94 0.166 2.52
2 288420 3880000 290 32 6.61 0.280 1.93
3 267833 3325000 248 33 6.90 0.429 0.502
4 258380 4250000 127 7 4.04 0.395 1.99
5 467169 1400000 145 28 11.8 0.119 1.12
6 466472 1320000 139 22 10.3 0.125 0.789
# ℹ 15 more variables: PROX_URA_GROWTH_AREA <dbl>, PROX_HAWKER_MARKET <dbl>,
# PROX_KINDERGARTEN <dbl>, PROX_MRT <dbl>, PROX_PARK <dbl>,
# PROX_PRIMARY_SCH <dbl>, PROX_TOP_PRIMARY_SCH <dbl>,
# PROX_SHOPPING_MALL <dbl>, PROX_SUPERMARKET <dbl>, PROX_BUS_STOP <dbl>,
# NO_Of_UNITS <dbl>, FAMILY_FRIENDLY <dbl>, FREEHOLD <dbl>,
# LEASEHOLD_99YR <dbl>, geometry <POINT [m]>
Notice that the output is a point feature data frame.
Geometry type: POINT
condo_resale_sf <- write_rds(condo_resale_sf,
                             "data/rds/condo_resale_sf.rds")
condo_resale_sf <- read_rds(
  "data/rds/condo_resale_sf.rds")
6 Correlation Analysis - ggstatsplot methods
6.0.1 Visualising the relationships of the independent variables
Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated to each other. If highly correlated independent variables are used in building a regression model, the quality of the model will be compromised. This phenomenon is known as multicollinearity in statistics.
Correlation matrix is commonly used to visualise the relationships between the independent variables. Besides the pairs() of R, there are many packages supporting the display of a correlation matrix. In this section, the corrplot package will be used.
The code chunk below is used to plot a correlation matrix of the independent variables in the condo_resale data frame.
corrplot(cor(condo_resale[, 5:23]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")
A matrix reorder is very important for mining the hidden structure and patterns in the matrix. There are four methods in corrplot (parameter order), named: “AOE”, “FPC”, “hclust”, “alphabet”.
In the code chunk above, AOE order is used. It orders the variables by using the angular order of the eigenvectors method suggested by Michael Friendly.
From the correlation matrix, it is clear that FREEHOLD is highly correlated with LEASEHOLD_99YR. In view of this, it is wiser to include only one of them in the subsequent model building.
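In this case, LEASEHOLD_99YR is the candidate for exclusion. Note, however, that the model calibrated in Section 7 still includes both variables, and their multicollinearity is checked again in Section 8.2.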
In the code chunk below, ggcorrmat() of ggstatsplot is used instead of the corrplot package.
ggcorrmat(condo_resale[, 5:23])
Similarly, it is observed that LEASEHOLD_99YR and FREEHOLD are highly correlated.
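Reading a large correlation plot by eye can be backed up by listing the offending pairs directly. The sketch below uses base R only and a synthetic data frame (toy, with made-up values that merely reuse four column names from this exercise); the 0.8 cut-off is an assumption chosen for illustration, not a threshold prescribed by corrplot or ggstatsplot.

```r
# Synthetic stand-in for the condo data: two mutually exclusive tenure
# dummies plus two unrelated continuous variables (all values made up).
set.seed(1)
n      <- 200
tenure <- rep(c("freehold", "leasehold_99", "other"), times = c(95, 95, 10))
toy <- data.frame(
  FREEHOLD       = as.numeric(tenure == "freehold"),
  LEASEHOLD_99YR = as.numeric(tenure == "leasehold_99"),
  AREA_SQM       = rnorm(n, mean = 130, sd = 40),
  AGE            = rnorm(n, mean = 12, sd = 6)
)

cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and flag |r| above the assumed cut-off.
idx <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
high_pairs
```

The two dummies come out strongly negatively correlated because almost every record is either freehold or 99-year leasehold, which mirrors the pattern seen between FREEHOLD and LEASEHOLD_99YR in the plots above.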
7 Building a hedonic pricing model using multiple linear regression method
The code chunk below uses lm() to calibrate the multiple linear regression model.
condo_mlr <- lm(formula = SELLING_PRICE ~ AREA_SQM + AGE +
PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +
PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET + PROX_KINDERGARTEN +
PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH +
PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET +
PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD + LEASEHOLD_99YR,
data = condo_resale_sf)
summary(condo_mlr)
Call:
lm(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + PROX_CHILDCARE +
PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET +
PROX_KINDERGARTEN + PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH +
PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET +
PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD +
LEASEHOLD_99YR, data = condo_resale_sf)
Residuals:
Min 1Q Median 3Q Max
-3471036 -286903 -22426 239412 12254549
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 543071.4 136210.9 3.987 7.03e-05 ***
AREA_SQM 12688.7 370.1 34.283 < 2e-16 ***
AGE -24566.0 2766.0 -8.881 < 2e-16 ***
PROX_CBD -78122.0 6791.4 -11.503 < 2e-16 ***
PROX_CHILDCARE -333219.0 111020.3 -3.001 0.002734 **
PROX_ELDERLYCARE 170950.0 42110.8 4.060 5.19e-05 ***
PROX_URA_GROWTH_AREA 38507.6 12523.7 3.075 0.002147 **
PROX_HAWKER_MARKET 23801.2 29299.9 0.812 0.416739
PROX_KINDERGARTEN 144098.0 82738.7 1.742 0.081795 .
PROX_MRT -322775.9 58528.1 -5.515 4.14e-08 ***
PROX_PARK 564487.9 66563.0 8.481 < 2e-16 ***
PROX_PRIMARY_SCH 186170.5 65515.2 2.842 0.004553 **
PROX_TOP_PRIMARY_SCH -477.1 20598.0 -0.023 0.981525
PROX_SHOPPING_MALL -207721.5 42855.5 -4.847 1.39e-06 ***
PROX_SUPERMARKET -48074.7 77145.3 -0.623 0.533273
PROX_BUS_STOP 675755.0 138552.0 4.877 1.20e-06 ***
NO_Of_UNITS -216.2 90.3 -2.394 0.016797 *
FAMILY_FRIENDLY 142128.3 47055.1 3.020 0.002569 **
FREEHOLD 300646.5 77296.5 3.890 0.000105 ***
LEASEHOLD_99YR -77137.4 77570.9 -0.994 0.320192
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 755800 on 1416 degrees of freedom
Multiple R-squared: 0.652, Adjusted R-squared: 0.6474
F-statistic: 139.6 on 19 and 1416 DF, p-value: < 2.2e-16
8 Model Assessment: olsrr method
In this section, we introduce an excellent R package designed specifically for conducting Ordinary Least Squares (OLS) regression: olsrr. This package offers a comprehensive set of tools to enhance the development of multiple linear regression models. Key features include:
- Detailed regression output
- Diagnostic tools for residual analysis
- Influence measures
- Tests for heteroskedasticity
- Model fit evaluation
- Assessment of variable contributions
- Procedures for variable selection
These functionalities make olsrr a powerful resource for building and refining regression models in R.
8.1 Generating tidy linear regression report
ols_regress(condo_mlr) # global model
Model Summary
-----------------------------------------------------------------------------
R 0.807 RMSE 750537.537
R-Squared 0.652 MSE 571262902261.223
Adj. R-Squared 0.647 Coef. Var 43.160
Pred R-Squared 0.637 AIC 42971.173
MAE 412117.987 SBC 43081.835
-----------------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------------
Regression 1.515738e+15 19 7.977571e+13 139.648 0.0000
Residual 8.089083e+14 1416 571262902261.223
Total 2.324647e+15 1435
--------------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------------------
(Intercept) 543071.420 136210.918 3.987 0.000 275874.535 810268.305
AREA_SQM 12688.669 370.119 0.579 34.283 0.000 11962.627 13414.710
AGE -24566.001 2766.041 -0.166 -8.881 0.000 -29991.980 -19140.022
PROX_CBD -78121.985 6791.377 -0.267 -11.503 0.000 -91444.227 -64799.744
PROX_CHILDCARE -333219.036 111020.303 -0.087 -3.001 0.003 -551000.984 -115437.089
PROX_ELDERLYCARE 170949.961 42110.748 0.083 4.060 0.000 88343.803 253556.120
PROX_URA_GROWTH_AREA 38507.622 12523.661 0.059 3.075 0.002 13940.700 63074.545
PROX_HAWKER_MARKET 23801.197 29299.923 0.019 0.812 0.417 -33674.725 81277.120
PROX_KINDERGARTEN 144097.972 82738.669 0.030 1.742 0.082 -18205.570 306401.514
PROX_MRT -322775.874 58528.079 -0.123 -5.515 0.000 -437586.937 -207964.811
PROX_PARK 564487.876 66563.011 0.148 8.481 0.000 433915.162 695060.590
PROX_PRIMARY_SCH 186170.524 65515.193 0.072 2.842 0.005 57653.253 314687.795
PROX_TOP_PRIMARY_SCH -477.073 20597.972 -0.001 -0.023 0.982 -40882.894 39928.747
PROX_SHOPPING_MALL -207721.520 42855.500 -0.109 -4.847 0.000 -291788.613 -123654.427
PROX_SUPERMARKET -48074.679 77145.257 -0.012 -0.623 0.533 -199405.956 103256.599
PROX_BUS_STOP 675755.044 138551.991 0.133 4.877 0.000 403965.817 947544.272
NO_Of_UNITS -216.180 90.302 -0.046 -2.394 0.017 -393.320 -39.040
FAMILY_FRIENDLY 142128.272 47055.082 0.056 3.020 0.003 49823.107 234433.438
FREEHOLD 300646.543 77296.529 0.117 3.890 0.000 149018.525 452274.561
LEASEHOLD_99YR -77137.375 77570.869 -0.030 -0.994 0.320 -229303.551 75028.801
-----------------------------------------------------------------------------------------------------------------
The ols_regress() function generates a tidier report of the condo_mlr results. In the ANOVA table, the p-value of the F-test (Sig. 0.0000) is smaller than our alpha value of 0.05, so we can reject the null hypothesis that none of the independent variables explains the selling price. Based on the Adjusted R-Squared value, this multiple linear regression model is able to explain 64.7% of the price variation.
PROX_TOP_PRIMARY_SCH and PROX_SUPERMARKET are not statistically significant, with p-values above 0.05 (0.982 and 0.533). The same holds for PROX_HAWKER_MARKET (0.417), PROX_KINDERGARTEN (0.082) and LEASEHOLD_99YR (0.320). This indicates that they are candidates for removal when the model is refined later on.
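The headline figures in the report can be re-derived from the ANOVA table alone. The sketch below copies the sums of squares and degrees of freedom from the ols_regress() output above and applies the standard formulas; the variable names are hypothetical, and because the inputs are rounded as printed, the results match the report only to about three decimal places.

```r
# Values copied from the ANOVA table printed by ols_regress(condo_mlr)
ss_reg <- 1.515738e+15   # regression sum of squares
ss_res <- 8.089083e+14   # residual sum of squares
df_reg <- 19             # number of independent variables
df_res <- 1416           # n - number of independent variables - 1

ss_total <- ss_reg + ss_res
df_total <- df_reg + df_res

# R-squared: share of total variation explained by the model
r2 <- 1 - ss_res / ss_total

# Adjusted R-squared: same idea, but penalised by degrees of freedom
adj_r2 <- 1 - (ss_res / df_res) / (ss_total / df_total)

# F statistic: mean square of regression over mean square of residuals
f_stat  <- (ss_reg / df_reg) / (ss_res / df_res)
p_value <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

round(c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat), 3)
p_value < 0.05   # TRUE means the null hypothesis is rejected at alpha = 0.05
```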
8.2 Multicollinearity
Variance Inflation Factors (VIF) are calculated in this section, after the model is calibrated. The steps taken before reaching this point:
- Refer to the ANOVA table to reject the null hypothesis
- Check the Adjusted R-squared value
- Only then move on to the parameter estimates
ols_vif_tol(condo_mlr)
Variables Tolerance VIF
1 AREA_SQM 0.8601326 1.162611
2 AGE 0.7011585 1.426211
3 PROX_CBD 0.4575471 2.185567
4 PROX_CHILDCARE 0.2898233 3.450378
5 PROX_ELDERLYCARE 0.5922238 1.688551
6 PROX_URA_GROWTH_AREA 0.6614081 1.511926
7 PROX_HAWKER_MARKET 0.4373874 2.286303
8 PROX_KINDERGARTEN 0.8356793 1.196631
9 PROX_MRT 0.4949877 2.020252
10 PROX_PARK 0.8015728 1.247547
11 PROX_PRIMARY_SCH 0.3823248 2.615577
12 PROX_TOP_PRIMARY_SCH 0.4878620 2.049760
13 PROX_SHOPPING_MALL 0.4903052 2.039546
14 PROX_SUPERMARKET 0.6142127 1.628100
15 PROX_BUS_STOP 0.3311024 3.020213
16 NO_Of_UNITS 0.6543336 1.528272
17 FAMILY_FRIENDLY 0.7191719 1.390488
18 FREEHOLD 0.2728521 3.664990
19 LEASEHOLD_99YR 0.2645988 3.779307
Based on the results, none of the variables has a VIF greater than 5. The VIF of each independent variable is obtained by regressing it on all the other independent variables, so it measures how much of that variable is already explained by the rest. This shows that there is no need to eliminate any variable on multicollinearity grounds. The rule of thumb used here is:
- 0 to 5: variables are not correlated
- 5 to 10: variables are correlated
- Greater than 10: variables are highly correlated
Note that the binary (dummy) variables, such as the Y/N tenure indicators LEASEHOLD_99YR and FREEHOLD, show some signs of correlation: they have the highest VIFs in the table (3.78 and 3.66).
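To make the calculation concrete, the sketch below computes tolerance and VIF by hand in base R, using the standard definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing variable j on all the other independent variables. The data are synthetic and the names toy, x1, x2 and x3 are hypothetical; this illustrates the definition and is not a copy of how olsrr implements ols_vif_tol().

```r
# Synthetic predictors: x2 is deliberately built from x1, x3 is unrelated.
set.seed(42)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# For each variable, regress it on the remaining ones and convert the
# resulting R-squared into a VIF.
vif_by_hand <- sapply(names(toy), function(v) {
  aux <- lm(reformulate(setdiff(names(toy), v), response = v), data = toy)
  1 / (1 - summary(aux)$r.squared)
})

# Same layout as the table above: tolerance is simply the reciprocal of VIF.
data.frame(Variables = names(vif_by_hand),
           Tolerance = 1 / vif_by_hand,
           VIF       = vif_by_hand,
           row.names = NULL)
```

Because x2 is built from x1, both receive a noticeably larger VIF than x3, which stays close to 1.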
8.3 Variable Selection
Stepwise regression is used for variable selection.
Forward stepwise: the model starts with no independent variables, and they are added one at a time. Each time a variable is added, the R-squared and Adjusted R-squared are recalculated and the entry criterion is checked (e.g. at the 95% confidence level, a variable with a p-value above 0.05 is rejected; to enter, a variable must have a p-value below 0.05 and must improve the R-squared value).
Backward stepwise: the model starts with all the variables, and they are removed one at a time based on how the Adjusted R-squared changes and on criteria such as the p-value.
Neither method uses replacement: once a variable has been added or rejected in an iteration, that decision is not revisited.
Mixed stepwise: uses the forward stepwise method, but with replacement.
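The forward procedure described above can be sketched in a few lines of base R. This is a simplified illustration on synthetic data, not the olsrr implementation: the function forward_p(), its argument p_enter, and the toy data are all hypothetical, and the only entry criterion applied is the candidate's p-value.

```r
# Synthetic data: y depends on x1 and x2; x3 is pure noise.
set.seed(123)
n   <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 3 * toy$x1 + 1.5 * toy$x2 + rnorm(n)

forward_p <- function(data, response, candidates, p_enter = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each remaining candidate when added to the current model
    p_vals <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      summary(fit)$coefficients[v, "Pr(>|t|)"]
    })
    best <- names(which.min(p_vals))
    if (p_vals[best] >= p_enter) break  # no candidate meets the criterion
    selected <- c(selected, best)       # no replacement: never removed again
  }
  selected
}

forward_p(toy, "y", c("x1", "x2", "x3"))
```

The returned vector lists the variables in the order they entered the model, which is the same information the "Selected =>" lines report in the detailed output further below.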
These procedures are already built into the olsrr package.
condo_fw_mlr <- ols_step_forward_p( # selection criterion: p-value
  condo_mlr,
  p_val = 0.05,
  details = TRUE) # details = TRUE prints every iteration plus the full report; details = FALSE prints only the final forward selection summary
------------------------
Candidate Terms:
1. AREA_SQM
2. AGE
3. PROX_CBD
4. PROX_CHILDCARE
5. PROX_ELDERLYCARE
6. PROX_URA_GROWTH_AREA
7. PROX_HAWKER_MARKET
8. PROX_KINDERGARTEN
9. PROX_MRT
10. PROX_PARK
11. PROX_PRIMARY_SCH
12. PROX_TOP_PRIMARY_SCH
13. PROX_SHOPPING_MALL
14. PROX_SUPERMARKET
15. PROX_BUS_STOP
16. NO_Of_UNITS
17. FAMILY_FRIENDLY
18. FREEHOLD
19. LEASEHOLD_99YR
Step => 0
Model => SELLING_PRICE ~ 1
R2 => 0
Initiating stepwise selection...
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
AREA_SQM 0.00000 0.452 0.451 43587.753
PROX_CBD 0.00000 0.243 0.242 44051.772
FREEHOLD 0.00000 0.082 0.081 44328.539
LEASEHOLD_99YR 0.00000 0.066 0.065 44353.172
PROX_PARK 0.00000 0.049 0.048 44378.817
NO_Of_UNITS 0.00000 0.048 0.048 44380.124
PROX_PRIMARY_SCH 0.00000 0.032 0.032 44403.847
PROX_HAWKER_MARKET 0.00000 0.023 0.022 44417.505
PROX_CHILDCARE 0.00000 0.021 0.021 44420.298
PROX_ELDERLYCARE 0.00000 0.021 0.020 44420.546
PROX_BUS_STOP 0.00000 0.021 0.020 44420.742
PROX_KINDERGARTEN 2e-05 0.013 0.012 44432.322
PROX_SUPERMARKET 0.00088 0.008 0.007 44439.977
PROX_SHOPPING_MALL 0.00154 0.007 0.006 44441.023
FAMILY_FRIENDLY 0.00907 0.005 0.004 44444.248
PROX_MRT 0.01071 0.005 0.004 44444.545
PROX_URA_GROWTH_AREA 0.13510 0.002 0.001 44448.832
PROX_TOP_PRIMARY_SCH 0.23180 0.001 0.000 44449.636
AGE 0.52978 0.000 0.000 44450.673
----------------------------------------------------------------------------
Step => 1
Selected => AREA_SQM
Model => SELLING_PRICE ~ AREA_SQM
R2 => 0.452
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_CBD 0.00000 0.569 0.569 43243.523
FREEHOLD 0.00000 0.487 0.487 43493.627
PROX_PARK 0.00000 0.478 0.478 43518.542
LEASEHOLD_99YR 0.00000 0.475 0.474 43527.150
AGE 0.00000 0.471 0.470 43538.063
PROX_SHOPPING_MALL 0.00000 0.467 0.466 43549.216
PROX_HAWKER_MARKET 0.00000 0.465 0.464 43555.065
PROX_MRT 0.00000 0.465 0.464 43556.097
NO_Of_UNITS 0.00000 0.464 0.463 43557.089
PROX_SUPERMARKET 0.00000 0.461 0.461 43564.792
PROX_PRIMARY_SCH 3e-05 0.458 0.458 43572.418
PROX_ELDERLYCARE 5e-05 0.458 0.457 43573.203
PROX_URA_GROWTH_AREA 9e-05 0.458 0.457 43574.292
FAMILY_FRIENDLY 0.00026 0.457 0.456 43576.392
PROX_CHILDCARE 0.00275 0.455 0.455 43580.768
PROX_BUS_STOP 0.00381 0.455 0.454 43581.362
PROX_KINDERGARTEN 0.15757 0.453 0.452 43587.751
PROX_TOP_PRIMARY_SCH 0.47485 0.452 0.451 43589.241
----------------------------------------------------------------------------
Step => 2
Selected => PROX_CBD
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD
R2 => 0.569
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_PARK 0.00000 0.589 0.588 43177.691
AGE 0.00000 0.586 0.585 43188.935
FREEHOLD 0.00000 0.579 0.578 43213.005
PROX_ELDERLYCARE 0.00000 0.578 0.577 43216.850
PROX_TOP_PRIMARY_SCH 0.00000 0.577 0.576 43218.861
LEASEHOLD_99YR 0.00000 0.576 0.575 43224.500
PROX_HAWKER_MARKET 1e-05 0.575 0.574 43225.123
PROX_SHOPPING_MALL 8e-05 0.574 0.573 43229.948
PROX_SUPERMARKET 0.00147 0.572 0.571 43235.376
PROX_MRT 0.00613 0.572 0.571 43237.989
NO_Of_UNITS 0.01059 0.571 0.570 43238.970
PROX_PRIMARY_SCH 0.04530 0.570 0.570 43241.503
PROX_BUS_STOP 0.06634 0.570 0.569 43242.142
FAMILY_FRIENDLY 0.11212 0.570 0.569 43242.991
PROX_CHILDCARE 0.29768 0.570 0.569 43244.435
PROX_URA_GROWTH_AREA 0.78658 0.569 0.568 43245.450
PROX_KINDERGARTEN 0.80879 0.569 0.568 43245.465
----------------------------------------------------------------------------
Step => 3
Selected => PROX_PARK
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK
R2 => 0.589
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
FREEHOLD 0.00000 0.604 0.603 43125.474
AGE 0.00000 0.602 0.601 43132.534
LEASEHOLD_99YR 0.00000 0.601 0.600 43138.902
PROX_ELDERLYCARE 0.00000 0.596 0.595 43153.932
PROX_TOP_PRIMARY_SCH 3e-05 0.594 0.593 43162.363
NO_Of_UNITS 0.00013 0.593 0.592 43164.977
PROX_SHOPPING_MALL 0.00015 0.593 0.592 43165.286
PROX_HAWKER_MARKET 7e-04 0.592 0.591 43168.151
PROX_MRT 0.00250 0.592 0.591 43170.516
FAMILY_FRIENDLY 0.02445 0.591 0.589 43174.609
PROX_SUPERMARKET 0.02905 0.591 0.589 43174.908
PROX_URA_GROWTH_AREA 0.14518 0.590 0.589 43177.560
PROX_CHILDCARE 0.31093 0.589 0.588 43178.660
PROX_PRIMARY_SCH 0.34515 0.589 0.588 43178.796
PROX_BUS_STOP 0.47898 0.589 0.588 43179.188
PROX_KINDERGARTEN 0.87351 0.589 0.588 43179.665
----------------------------------------------------------------------------
Step => 4
Selected => FREEHOLD
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD
R2 => 0.604
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
AGE 0.00000 0.620 0.619 43069.222
PROX_SHOPPING_MALL 0.00000 0.611 0.609 43104.195
PROX_ELDERLYCARE 5e-05 0.609 0.608 43111.036
PROX_TOP_PRIMARY_SCH 7e-05 0.609 0.607 43111.551
PROX_HAWKER_MARKET 0.00088 0.607 0.606 43116.360
PROX_SUPERMARKET 0.00324 0.607 0.605 43118.765
PROX_MRT 0.00345 0.607 0.605 43118.882
PROX_BUS_STOP 0.09204 0.605 0.604 43124.623
FAMILY_FRIENDLY 0.11599 0.605 0.604 43124.992
PROX_PRIMARY_SCH 0.21752 0.605 0.603 43125.946
NO_Of_UNITS 0.25242 0.605 0.603 43126.158
PROX_URA_GROWTH_AREA 0.27640 0.605 0.603 43126.284
LEASEHOLD_99YR 0.49846 0.605 0.603 43127.014
PROX_KINDERGARTEN 0.66364 0.604 0.603 43127.284
PROX_CHILDCARE 0.82289 0.604 0.603 43127.424
----------------------------------------------------------------------------
Step => 5
Selected => AGE
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE
R2 => 0.62
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_ELDERLYCARE 0.00000 0.627 0.625 43046.515
PROX_SHOPPING_MALL 0.00014 0.624 0.622 43056.710
PROX_TOP_PRIMARY_SCH 0.00036 0.623 0.622 43058.400
PROX_MRT 0.00118 0.623 0.621 43060.651
PROX_HAWKER_MARKET 0.00229 0.623 0.621 43061.874
NO_Of_UNITS 0.03614 0.621 0.620 43066.808
PROX_SUPERMARKET 0.03902 0.621 0.620 43066.940
PROX_PRIMARY_SCH 0.04454 0.621 0.620 43067.165
PROX_URA_GROWTH_AREA 0.05538 0.621 0.619 43067.532
FAMILY_FRIENDLY 0.06368 0.621 0.619 43067.765
PROX_BUS_STOP 0.09258 0.621 0.619 43068.378
LEASEHOLD_99YR 0.33191 0.620 0.619 43070.276
PROX_KINDERGARTEN 0.54422 0.620 0.619 43070.852
PROX_CHILDCARE 0.76117 0.620 0.619 43071.129
----------------------------------------------------------------------------
Step => 6
Selected => PROX_ELDERLYCARE
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE
R2 => 0.627
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_SHOPPING_MALL 0.00000 0.634 0.632 43020.990
PROX_MRT 0.00000 0.633 0.631 43024.733
PROX_SUPERMARKET 0.00311 0.629 0.627 43039.719
PROX_CHILDCARE 0.00320 0.629 0.627 43039.776
PROX_TOP_PRIMARY_SCH 0.02859 0.628 0.626 43043.694
FAMILY_FRIENDLY 0.04001 0.628 0.626 43044.273
PROX_URA_GROWTH_AREA 0.06111 0.628 0.626 43044.987
PROX_HAWKER_MARKET 0.14370 0.627 0.625 43046.364
NO_Of_UNITS 0.21750 0.627 0.625 43046.985
LEASEHOLD_99YR 0.33225 0.627 0.625 43047.569
PROX_PRIMARY_SCH 0.72554 0.627 0.625 43048.391
PROX_BUS_STOP 0.73834 0.627 0.625 43048.403
PROX_KINDERGARTEN 0.96832 0.627 0.625 43048.513
----------------------------------------------------------------------------
Step => 7
Selected => PROX_SHOPPING_MALL
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL
R2 => 0.634
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_URA_GROWTH_AREA 2e-04 0.637 0.635 43009.092
PROX_MRT 0.00038 0.637 0.635 43010.278
FAMILY_FRIENDLY 0.09004 0.634 0.632 43020.098
NO_Of_UNITS 0.09561 0.634 0.632 43020.195
PROX_BUS_STOP 0.10105 0.634 0.632 43020.284
PROX_CHILDCARE 0.16782 0.634 0.632 43021.075
PROX_PRIMARY_SCH 0.20169 0.634 0.632 43021.349
PROX_HAWKER_MARKET 0.28053 0.634 0.632 43021.818
PROX_SUPERMARKET 0.39017 0.634 0.632 43022.247
LEASEHOLD_99YR 0.41342 0.634 0.632 43022.317
PROX_KINDERGARTEN 0.64794 0.634 0.632 43022.781
PROX_TOP_PRIMARY_SCH 0.88928 0.634 0.632 43022.971
----------------------------------------------------------------------------
Step => 8
Selected => PROX_URA_GROWTH_AREA
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA
R2 => 0.637
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_MRT 0.00055 0.640 0.638 42999.058
NO_Of_UNITS 0.04357 0.638 0.636 43006.989
PROX_BUS_STOP 0.07301 0.638 0.636 43007.854
FAMILY_FRIENDLY 0.07751 0.638 0.636 43007.953
PROX_CHILDCARE 0.17683 0.638 0.635 43009.255
LEASEHOLD_99YR 0.26341 0.638 0.635 43009.832
PROX_SUPERMARKET 0.32522 0.637 0.635 43010.117
PROX_TOP_PRIMARY_SCH 0.36995 0.637 0.635 43010.282
PROX_HAWKER_MARKET 0.48716 0.637 0.635 43010.606
PROX_KINDERGARTEN 0.49501 0.637 0.635 43010.623
PROX_PRIMARY_SCH 0.60814 0.637 0.635 43010.827
----------------------------------------------------------------------------
Step => 9
Selected => PROX_MRT
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT
R2 => 0.64
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_BUS_STOP 6e-05 0.644 0.642 42984.951
PROX_PRIMARY_SCH 0.01738 0.642 0.639 42995.355
NO_Of_UNITS 0.04105 0.641 0.639 42996.851
FAMILY_FRIENDLY 0.06468 0.641 0.639 42997.618
PROX_TOP_PRIMARY_SCH 0.16342 0.641 0.638 42999.100
LEASEHOLD_99YR 0.16895 0.641 0.638 42999.151
PROX_KINDERGARTEN 0.19107 0.641 0.638 42999.335
PROX_HAWKER_MARKET 0.19288 0.641 0.638 42999.349
PROX_SUPERMARKET 0.45603 0.640 0.638 43000.498
PROX_CHILDCARE 0.71809 0.640 0.638 43000.927
----------------------------------------------------------------------------
Step => 10
Selected => PROX_BUS_STOP
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP
R2 => 0.644
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
FAMILY_FRIENDLY 0.01590 0.646 0.643 42981.085
PROX_CHILDCARE 0.02032 0.646 0.643 42981.519
NO_Of_UNITS 0.03658 0.645 0.643 42982.543
PROX_PRIMARY_SCH 0.06688 0.645 0.642 42983.563
PROX_KINDERGARTEN 0.09160 0.645 0.642 42984.080
LEASEHOLD_99YR 0.10015 0.645 0.642 42984.224
PROX_TOP_PRIMARY_SCH 0.27924 0.645 0.642 42985.770
PROX_HAWKER_MARKET 0.53937 0.644 0.642 42986.571
PROX_SUPERMARKET 0.91393 0.644 0.641 42986.939
----------------------------------------------------------------------------
Step => 11
Selected => FAMILY_FRIENDLY
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY
R2 => 0.646
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
NO_Of_UNITS 0.00533 0.648 0.645 42975.246
PROX_CHILDCARE 0.01908 0.647 0.644 42977.539
PROX_PRIMARY_SCH 0.06018 0.647 0.644 42979.519
LEASEHOLD_99YR 0.06704 0.647 0.644 42979.699
PROX_KINDERGARTEN 0.09772 0.646 0.643 42980.317
PROX_TOP_PRIMARY_SCH 0.31070 0.646 0.643 42982.048
PROX_HAWKER_MARKET 0.66885 0.646 0.643 42982.901
PROX_SUPERMARKET 0.92593 0.646 0.643 42983.077
----------------------------------------------------------------------------
Step => 12
Selected => NO_Of_UNITS
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS
R2 => 0.648
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_CHILDCARE 0.02092 0.649 0.646 42971.858
PROX_PRIMARY_SCH 0.05496 0.649 0.645 42973.525
PROX_KINDERGARTEN 0.13311 0.648 0.645 42974.967
LEASEHOLD_99YR 0.16053 0.648 0.645 42975.257
PROX_TOP_PRIMARY_SCH 0.28337 0.648 0.645 42976.084
PROX_HAWKER_MARKET 0.62348 0.648 0.644 42977.003
PROX_SUPERMARKET 0.65604 0.648 0.644 42977.046
----------------------------------------------------------------------------
Step => 13
Selected => PROX_CHILDCARE
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS + PROX_CHILDCARE
R2 => 0.649
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_PRIMARY_SCH 0.00805 0.651 0.647 42966.758
PROX_KINDERGARTEN 0.08599 0.650 0.646 42970.878
PROX_TOP_PRIMARY_SCH 0.23060 0.649 0.646 42972.405
LEASEHOLD_99YR 0.32104 0.649 0.646 42972.863
PROX_HAWKER_MARKET 0.49652 0.649 0.646 42973.391
PROX_SUPERMARKET 0.59607 0.649 0.646 42973.574
----------------------------------------------------------------------------
Step => 14
Selected => PROX_PRIMARY_SCH
Model => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS + PROX_CHILDCARE + PROX_PRIMARY_SCH
R2 => 0.651
Selection Metrics Table
----------------------------------------------------------------------------
Predictor Pr(>|t|) R-Squared Adj. R-Squared AIC
----------------------------------------------------------------------------
PROX_KINDERGARTEN 0.07528 0.651 0.648 42965.558
LEASEHOLD_99YR 0.24093 0.651 0.647 42967.367
PROX_HAWKER_MARKET 0.29790 0.651 0.647 42967.662
PROX_TOP_PRIMARY_SCH 0.38435 0.651 0.647 42967.993
PROX_SUPERMARKET 0.76578 0.651 0.647 42968.669
----------------------------------------------------------------------------
No more variables to be added.
Variables Selected:
=> AREA_SQM
=> PROX_CBD
=> PROX_PARK
=> FREEHOLD
=> AGE
=> PROX_ELDERLYCARE
=> PROX_SHOPPING_MALL
=> PROX_URA_GROWTH_AREA
=> PROX_MRT
=> PROX_BUS_STOP
=> FAMILY_FRIENDLY
=> NO_Of_UNITS
=> PROX_CHILDCARE
=> PROX_PRIMARY_SCH
Forward selection uses the p-value to keep only the statistically significant predictors.
The resulting condo_fw_mlr object is a list with three components: metrics, model and others.
plot(condo_fw_mlr)
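To make the selection procedure more concrete, here is a minimal base R sketch of forward selection by p-value on simulated data. It is an illustration only, not the olsrr implementation: the function forward_by_p(), the 0.05 entry threshold and the toy variables x1 to x3 are all assumptions made for this example.

```r
# Forward selection by p-value (illustrative sketch, not the olsrr code).
# forward_by_p() and p_enter are hypothetical names used only here.
forward_by_p <- function(data, response, p_enter = 0.05) {
  candidates <- setdiff(names(data), response)
  selected <- character(0)
  while (length(selected) < length(candidates)) {
    remaining <- setdiff(candidates, selected)
    # p-value of each remaining predictor when added to the current model
    p_values <- vapply(remaining, function(term) {
      fit <- lm(reformulate(c(selected, term), response), data = data)
      coef(summary(fit))[term, "Pr(>|t|)"]
    }, numeric(1))
    best <- names(which.min(p_values))
    if (p_values[[best]] >= p_enter) break  # nothing significant left to add
    selected <- c(selected, best)
  }
  selected
}

set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n)  # x3 is pure noise

selected <- forward_by_p(toy, response = "y")
print(selected)
```

At each step the candidate with the smallest p-value enters the model, which mirrors the "Selected =>" lines in the output above.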
8.4 Visualising model parameters
ggcoefstats(condo_mlr,
            sort = "ascending")
Number of labels is greater than default palette color count.
• Select another color `palette` (and/or `package`).

8.5 Test for Non-Linearity
In multiple linear regression, it is important to test the assumption of linearity and additivity in the relationship between the dependent and independent variables.
In the code chunk below, the ols_plot_resid_fit() of olsrr package is used to perform linearity assumption test.
ols_plot_resid_fit(condo_fw_mlr$model)
The figure above reveals that most of the data points are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent and independent variables are linear.
8.6 Tests for Normality Assumption
In the code chunk below, ols_plot_resid_hist() of olsrr package is used to perform normality assumption test.
ols_plot_resid_hist(condo_fw_mlr$model)
The figure above reveals that the residuals of the multiple linear regression model (i.e. condo_fw_mlr$model) resemble a normal distribution.
For formal statistical test methods, the ols_test_normality() of olsrr package can be used as shown in the code chunk below.
ols_test_normality(condo_fw_mlr$model)
Warning in ks.test.default(y, "pnorm", mean(y), sd(y)): ties should not be
present for the one-sample Kolmogorov-Smirnov test
-----------------------------------------------
Test Statistic pvalue
-----------------------------------------------
Shapiro-Wilk 0.6856 0.0000
Kolmogorov-Smirnov 0.1366 0.0000
Cramer-von Mises 121.0768 0.0000
Anderson-Darling 67.9551 0.0000
-----------------------------------------------
The summary table reveals that the p-values of the four tests are far smaller than the alpha value of 0.05. Hence we reject the null hypothesis and infer that there is statistical evidence that the residuals are not normally distributed.
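The decision rule can also be tried out with base R alone. The sketch below fits a regression on simulated data with deliberately right-skewed errors and applies shapiro.test() to the residuals. The data and variable names are invented for illustration and have nothing to do with the condo data set.

```r
set.seed(42)
n <- 500
x <- runif(n, 50, 300)
# Right-skewed errors, loosely mimicking a few very expensive units (made-up numbers)
y <- 5000 * x + rexp(n, rate = 1 / 200000)
fit <- lm(y ~ x)

# Shapiro-Wilk test on the residuals; H0: the residuals are normally distributed
sw <- shapiro.test(residuals(fit))
print(sw)

alpha <- 0.05
reject_normality <- sw$p.value < alpha
print(reject_normality)
```

A p-value below alpha leads to the same conclusion as in the table above: the residuals are not normally distributed.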
8.7 Testing for spatial autocorrelation
The hedonic model to be built will utilise geographically referenced attributes, hence it is also important for us to visualise the residual of the hedonic pricing model.
First, we will export the residual of the hedonic pricing model and save it as a data frame.
mlr_output <- as.data.frame(condo_fw_mlr$model$residuals) %>%
  rename(`FW_MLR_RES` = `condo_fw_mlr$model$residuals`) # rename to shorten the field name
Next, we will join the newly created data frame with the condo_resale_sf object.
# condo_resale_sf is point data with no common identifier to join on,
# so cbind() is used to append the residuals by row position
condo_resale_sf <- cbind(condo_resale_sf,
                         mlr_output$FW_MLR_RES) %>%
  rename(`MLR_RES` = `mlr_output.FW_MLR_RES`)
Next, we will use the tmap package to display the distribution of the residuals on an interactive map.
The code chunk below turns on the interactive mode of tmap.
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(mpsz) +
  # check.and.fix = TRUE resolves the invalid polygon geometry in the `mpsz` layer;
  # it can also be set once at the start to avoid the problem everywhere
  tmap_options(check.and.fix = TRUE) +
  tm_polygons(alpha = 0.4) + # the geometry error is due to a HDB flat polygon left in the dataset
  tm_shape(condo_resale_sf) +
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style = "quantile")
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot") # switch the mode back to plot
tmap mode set to plotting
The plot above reveals that there are signs of spatial autocorrelation.
8.8 Spatial stationary test
To validate our observation, we will conduct the Moran’s I test.
- Null hypothesis (Ho): The residuals are randomly distributed (i.e., spatially stationary).
- Alternative hypothesis (H1): The residuals are not randomly distributed and are spatially non-stationary.
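As a reminder for reading the residual map above, the residual is the difference between the actual transacted price and the price estimated by the model. The darker green shades mark transactions where the estimated price is higher than the actual transacted price, while the lighter colours mark transactions whose actual price is much lower than the estimated price.
As a first step, we will derive a neighbour list and row-standardised spatial weights. Although dnearneigh() of the spdep package can build a distance-based weight matrix, the code chunk below uses st_knn() and st_weights() of the sfdep package with the six nearest neighbours instead. The latest version of GWmodel also facilitates the use of sfdep.
The Moran’s I test is then performed on these weights.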
condo_resale_sf <- condo_resale_sf %>%
  mutate(nb = st_knn(geometry, k = 6, # six nearest neighbours
                     longlat = FALSE), # the data is already projected, so planar distances are used instead of great-circle distances
         wt = st_weights(nb,
                         style = "W"),
         .before = 1)
Next, global_moran_perm() of sfdep is used to perform the global Moran’s I permutation test.
global_moran_perm(condo_resale_sf$MLR_RES, # data from condo_resale_sf and MLR_RES is the column that will be used
condo_resale_sf$nb,
condo_resale_sf$wt,
alternative = "two.sided",
nsim = 99) # 99 simulations; together with the observed statistic this gives 100
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.32254, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
The global Moran’s I permutation test on the residuals returns a p-value of less than 2.2e-16, which is smaller than the alpha value of 0.05. Hence, we reject the null hypothesis that the residuals are randomly distributed.
Since the observed global Moran’s I statistic of 0.32254 is greater than 0, we can infer that the residuals are spatially clustered.
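To see what the statistic and the permutation test measure, the sketch below computes a global Moran's I by hand in base R on simulated points, using row-standardised weights over the six nearest neighbours. It is a simplified illustration under stated assumptions (toy coordinates, a toy variable z and a one-sided pseudo p-value) and is not the sfdep implementation.

```r
set.seed(123)
n <- 100
coords <- cbind(x = runif(n), y = runif(n))
# A variable with a west-to-east trend plus noise, so nearby values are similar
z <- coords[, "x"] + rnorm(n, sd = 0.1)

# Row-standardised weights over the k nearest neighbours (style "W")
k <- 6
d <- as.matrix(dist(coords))
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nn <- order(d[i, ])[2:(k + 1)]  # position 1 is the point itself
  W[i, nn] <- 1 / k
}

moran_i <- function(z, W) {
  zc <- z - mean(z)
  (length(z) / sum(W)) * sum(W * outer(zc, zc)) / sum(zc^2)
}

observed <- moran_i(z, W)

# Reference distribution: shuffle the values over the fixed locations
nsim <- 99
simulated <- replicate(nsim, moran_i(sample(z), W))
p_one_sided <- (sum(simulated >= observed) + 1) / (nsim + 1)

print(observed)
print(p_one_sided)
```

A positive observed value that sits in the upper tail of the simulated values indicates clustering, which is the same reading applied to the residuals above.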
9 Building Hedonic Pricing Models using GWmodel
This section illustrates how to model hedonic pricing by using a geographically weighted regression model. Two bandwidth schemes are used:
- fixed bandwidth scheme
- adaptive bandwidth scheme
9.1 Building Fixed bandwidth GWR Model
In the code chunk below, bw.gwr() of the GWmodel package is used to determine the optimal fixed bandwidth for the model. Notice that the argument adaptive is set to FALSE, indicating that we are interested in computing a fixed bandwidth.
Two approaches can be used to determine the stopping rule: the cross-validation (CV) approach and the corrected AIC (AICc) approach. The stopping rule is set through the approach argument.
bw.fixed <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +
                     PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA +
                     PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH +
                     PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS +
                     FAMILY_FRIENDLY + FREEHOLD,
                   data = condo_resale_sf,
                   approach = "CV", # cross-validation
                   kernel = "gaussian", # has to be used in later steps for consistency
                   adaptive = FALSE,
                   longlat = FALSE) # the data is projected, so great-circle distances are not calculated
Fixed bandwidth: 17660.96 CV score: 8.259118e+14
Fixed bandwidth: 10917.26 CV score: 7.970454e+14
Fixed bandwidth: 6749.419 CV score: 7.273273e+14
Fixed bandwidth: 4173.553 CV score: 6.300006e+14
Fixed bandwidth: 2581.58 CV score: 5.404958e+14
Fixed bandwidth: 1597.687 CV score: 4.857515e+14
Fixed bandwidth: 989.6077 CV score: 4.722431e+14
Fixed bandwidth: 613.7939 CV score: 1.378294e+16
Fixed bandwidth: 1221.873 CV score: 4.778717e+14
Fixed bandwidth: 846.0596 CV score: 4.791629e+14
Fixed bandwidth: 1078.325 CV score: 4.751406e+14
Fixed bandwidth: 934.7772 CV score: 4.72518e+14
Fixed bandwidth: 1023.495 CV score: 4.730305e+14
Fixed bandwidth: 968.6643 CV score: 4.721317e+14
Fixed bandwidth: 955.7206 CV score: 4.722072e+14
Fixed bandwidth: 976.6639 CV score: 4.721387e+14
Fixed bandwidth: 963.7202 CV score: 4.721484e+14
Fixed bandwidth: 971.7199 CV score: 4.721293e+14
Fixed bandwidth: 973.6083 CV score: 4.721309e+14
Fixed bandwidth: 970.5527 CV score: 4.721295e+14
Fixed bandwidth: 972.4412 CV score: 4.721296e+14
Fixed bandwidth: 971.2741 CV score: 4.721292e+14
Fixed bandwidth: 970.9985 CV score: 4.721293e+14
Fixed bandwidth: 971.4443 CV score: 4.721292e+14
Fixed bandwidth: 971.5496 CV score: 4.721293e+14
Fixed bandwidth: 971.3793 CV score: 4.721292e+14
Fixed bandwidth: 971.3391 CV score: 4.721292e+14
Fixed bandwidth: 971.3143 CV score: 4.721292e+14
Fixed bandwidth: 971.3545 CV score: 4.721292e+14
Fixed bandwidth: 971.3296 CV score: 4.721292e+14
Fixed bandwidth: 971.345 CV score: 4.721292e+14
Fixed bandwidth: 971.3355 CV score: 4.721292e+14
Fixed bandwidth: 971.3413 CV score: 4.721292e+14
Fixed bandwidth: 971.3377 CV score: 4.721292e+14
Fixed bandwidth: 971.34 CV score: 4.721292e+14
Fixed bandwidth: 971.3405 CV score: 4.721292e+14
Fixed bandwidth: 971.3408 CV score: 4.721292e+14
Fixed bandwidth: 971.3403 CV score: 4.721292e+14
Fixed bandwidth: 971.3406 CV score: 4.721292e+14
Fixed bandwidth: 971.3404 CV score: 4.721292e+14
Fixed bandwidth: 971.3405 CV score: 4.721292e+14
Fixed bandwidth: 971.3405 CV score: 4.721292e+14
The bandwidths are distances in metres, and they generally become shorter as the search proceeds. The trend is not monotonic, however. For example:
- Fixed bandwidth: 613.7939 CV score: 1.378294e+16
- Fixed bandwidth: 1221.873 CV score: 4.778717e+14
Here the CV score at 613.7939 is far worse than the previous ones, so the next iteration tries a larger bandwidth again.
In the final iterations:
- Fixed bandwidth: 971.3405 CV score: 4.721292e+14
- Fixed bandwidth: 971.3408 CV score: 4.721292e+14
- Fixed bandwidth: 971.3403 CV score: 4.721292e+14
- Fixed bandwidth: 971.3406 CV score: 4.721292e+14
- Fixed bandwidth: 971.3404 CV score: 4.721292e+14
- Fixed bandwidth: 971.3405 CV score: 4.721292e+14
- Fixed bandwidth: 971.3405 CV score: 4.721292e+14
the bandwidth is refined in ever smaller steps while searching for the lowest CV score. Once the change becomes minimal, the iterations stop and 971.3405 is returned as the recommended fixed bandwidth.
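The CV score behind each line of that output can be sketched in base R. The example below computes a leave-one-out cross-validation score for a fixed Gaussian bandwidth on simulated data and then hands the same function to optimize() for a one-dimensional search. Everything here is an assumption for illustration (the toy data, the cv_score() helper and the use of optimize()); it is not the GWmodel code and makes no claim about the exact search GWmodel runs.

```r
set.seed(7)
n <- 150
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))
x <- rnorm(n)
beta <- 1 + coords[, 1] / 5          # the slope drifts from west to east
y <- 2 + beta * x + rnorm(n, sd = 0.3)

d <- as.matrix(dist(coords))         # pairwise Euclidean distances
X <- cbind(1, x)                     # design matrix with intercept

# Leave-one-out CV score for a fixed Gaussian bandwidth (hypothetical helper)
cv_score <- function(bw) {
  sq_err <- vapply(seq_len(n), function(i) {
    w <- exp(-0.5 * (d[i, ] / bw)^2) # Gaussian kernel weights around point i
    w[i] <- 0                        # leave observation i out
    b <- solve(crossprod(X, w * X), crossprod(X, w * y))
    (y[i] - sum(X[i, ] * b))^2
  }, numeric(1))
  sum(sq_err)
}

bandwidths <- c(1, 2, 4, 8, 16)
scores <- vapply(bandwidths, cv_score, numeric(1))
print(data.frame(bandwidth = bandwidths, cv = scores))

# One-dimensional search for the bandwidth with the lowest CV score
best <- optimize(cv_score, interval = range(bandwidths))
print(best$minimum)
```

As in the output above, each candidate bandwidth gets one CV score, and the search keeps narrowing towards the bandwidth with the lowest score.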
9.1.1 GWmodel Method - Fixed Bandwidth
Now, we use the code chunk below to calibrate the GWR model using the fixed bandwidth and the Gaussian kernel.
gwr_fixed <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +
                         PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA +
                         PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH +
                         PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS +
                         FAMILY_FRIENDLY + FREEHOLD,
                       data = condo_resale_sf,
                       bw = bw.fixed,
                       kernel = "gaussian", # same kernel as in the bandwidth search, for consistency
                       longlat = FALSE) # the data is projected, so great-circle distances are not calculated
The output is saved in a list of class “gwrm”. The variables are unchanged, but the GWR model accounts for the spatial component in its calculation. The code below can be used to display the model output.
gwr_fixed
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-10-17 00:08:29.422253
Call:
gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +
PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA +
PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL +
PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,
data = condo_resale_sf, bw = bw.fixed, kernel = "gaussian",
longlat = FALSE)
Dependent (y) variable: SELLING_PRICE
Independent variables: AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
Number of data points: 1436
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-3470778 -298119 -23481 248917 12234210
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 527633.22 108183.22 4.877 1.20e-06 ***
AREA_SQM 12777.52 367.48 34.771 < 2e-16 ***
AGE -24687.74 2754.84 -8.962 < 2e-16 ***
PROX_CBD -77131.32 5763.12 -13.384 < 2e-16 ***
PROX_CHILDCARE -318472.75 107959.51 -2.950 0.003231 **
PROX_ELDERLYCARE 185575.62 39901.86 4.651 3.61e-06 ***
PROX_URA_GROWTH_AREA 39163.25 11754.83 3.332 0.000885 ***
PROX_MRT -294745.11 56916.37 -5.179 2.56e-07 ***
PROX_PARK 570504.81 65507.03 8.709 < 2e-16 ***
PROX_PRIMARY_SCH 159856.14 60234.60 2.654 0.008046 **
PROX_SHOPPING_MALL -220947.25 36561.83 -6.043 1.93e-09 ***
PROX_BUS_STOP 682482.22 134513.24 5.074 4.42e-07 ***
NO_Of_UNITS -245.48 87.95 -2.791 0.005321 **
FAMILY_FRIENDLY 146307.58 46893.02 3.120 0.001845 **
FREEHOLD 350599.81 48506.48 7.228 7.98e-13 ***
---Significance stars
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 756000 on 1421 degrees of freedom
Multiple R-squared: 0.6507
Adjusted R-squared: 0.6472
F-statistic: 189.1 on 14 and 1421 DF, p-value: < 2.2e-16
***Extra Diagnostic information
Residual sum of squares: 8.120609e+14
Sigma(hat): 752522.9
AIC: 42966.76
AICc: 42967.14
BIC: 41731.39
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Fixed bandwidth: 971.3405
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu.
Intercept -3.5988e+07 -5.1998e+05 7.6780e+05 1.7412e+06
AREA_SQM 1.0003e+03 5.2758e+03 7.4740e+03 1.2301e+04
AGE -1.3475e+05 -2.0813e+04 -8.6260e+03 -3.7784e+03
PROX_CBD -7.7047e+07 -2.3608e+05 -8.3600e+04 3.4646e+04
PROX_CHILDCARE -6.0097e+06 -3.3667e+05 -9.7425e+04 2.9007e+05
PROX_ELDERLYCARE -3.5000e+06 -1.5970e+05 3.1971e+04 1.9577e+05
PROX_URA_GROWTH_AREA -3.0170e+06 -8.2013e+04 7.0749e+04 2.2612e+05
PROX_MRT -3.5282e+06 -6.5836e+05 -1.8833e+05 3.6922e+04
PROX_PARK -1.2062e+06 -2.1732e+05 3.5383e+04 4.1335e+05
PROX_PRIMARY_SCH -2.2695e+07 -1.7066e+05 4.8472e+04 5.1555e+05
PROX_SHOPPING_MALL -7.2585e+06 -1.6684e+05 -1.0517e+04 1.5923e+05
PROX_BUS_STOP -1.4676e+06 -4.5207e+04 3.7601e+05 1.1664e+06
NO_Of_UNITS -1.3170e+03 -2.4822e+02 -3.0846e+01 2.5496e+02
FAMILY_FRIENDLY -2.2749e+06 -1.1140e+05 7.6214e+03 1.6107e+05
FREEHOLD -9.2067e+06 3.8073e+04 1.5169e+05 3.7528e+05
Max.
Intercept 112793548
AREA_SQM 21575
AGE 434201
PROX_CBD 2704596
PROX_CHILDCARE 1654087
PROX_ELDERLYCARE 38867814
PROX_URA_GROWTH_AREA 78515730
PROX_MRT 3124316
PROX_PARK 18122425
PROX_PRIMARY_SCH 4637503
PROX_SHOPPING_MALL 1529952
PROX_BUS_STOP 11342182
NO_Of_UNITS 12907
FAMILY_FRIENDLY 1720744
FREEHOLD 6073636
************************Diagnostic information*************************
Number of data points: 1436
Effective number of parameters (2trace(S) - trace(S'S)): 438.3804
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 997.6196
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 42263.61
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41632.36
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 42515.71
Residual sum of squares: 2.53407e+14
R-square value: 0.8909912
Adjusted R-square value: 0.8430417
***********************************************************************
Program stops at: 2024-10-17 00:08:30.52213
The report shows that the AICc of the fixed bandwidth GWR is 42263.61 (under the Diagnostic information section), which is substantially smaller than the AICc of 42967.14 for the global multiple linear regression model.
9.2 Building Adaptive Bandwidth GWR Model
In this section, the GWR-based hedonic pricing model is calibrated using the adaptive bandwidth approach.
As in the earlier section, we first use bw.gwr() to determine the recommended number of data points (nearest neighbours) to use.
The code chunk looks very similar to the one used to compute the fixed bandwidth, except that the adaptive argument is set to TRUE.
bw.adaptive <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE +
                        PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +
                        PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK +
                        PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_BUS_STOP +
                        NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,
                      data = condo_resale_sf,
                      approach = "CV",
                      kernel = "gaussian",
                      adaptive = TRUE,
                      longlat = FALSE)
Adaptive bandwidth: 895 CV score: 7.952401e+14
Adaptive bandwidth: 561 CV score: 7.667364e+14
Adaptive bandwidth: 354 CV score: 6.953454e+14
Adaptive bandwidth: 226 CV score: 6.15223e+14
Adaptive bandwidth: 147 CV score: 5.674373e+14
Adaptive bandwidth: 98 CV score: 5.426745e+14
Adaptive bandwidth: 68 CV score: 5.168117e+14
Adaptive bandwidth: 49 CV score: 4.859631e+14
Adaptive bandwidth: 37 CV score: 4.646518e+14
Adaptive bandwidth: 30 CV score: 4.422088e+14
Adaptive bandwidth: 25 CV score: 4.430816e+14
Adaptive bandwidth: 32 CV score: 4.505602e+14
Adaptive bandwidth: 27 CV score: 4.462172e+14
Adaptive bandwidth: 30 CV score: 4.422088e+14
The recommended adaptive bandwidth is 30 nearest neighbours, meaning that the local regression at each location is weighted around its 30 nearest data points.
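One way to picture an adaptive bandwidth is as a distance that changes from place to place: short where data points are dense and long where they are sparse. The base R sketch below measures, for simulated points, the distance from each point to its 30th nearest neighbour. The coordinates are made up, and the exact neighbour-counting convention used by GWmodel (for instance, whether the point itself is counted) is not asserted here.

```r
set.seed(2015)
# Toy projected coordinates in metres: a dense cluster plus a sparse fringe
dense  <- cbind(rnorm(300, 0, 500), rnorm(300, 0, 500))
sparse <- cbind(runif(100, -10000, 10000), runif(100, -10000, 10000))
coords <- rbind(dense, sparse)

k <- 30
d <- as.matrix(dist(coords))
# Distance from each point to its k-th nearest neighbour (excluding itself)
kth_dist <- apply(d, 1, function(row) sort(row)[k + 1])

print(summary(kth_dist[1:300]))    # dense cluster: short local bandwidths
print(summary(kth_dist[301:400]))  # sparse fringe: much longer local bandwidths
```

The same k therefore corresponds to very different distances across the study area, which is the main practical difference from the single fixed bandwidth of about 971 metres found earlier.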
Now, we calibrate the GWR-based hedonic pricing model using the adaptive bandwidth and the Gaussian kernel, as shown in the code chunk below.
gwr_adaptive <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE +
                            PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +
                            PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK +
                            PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_BUS_STOP +
                            NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,
                          data = condo_resale_sf, bw = bw.adaptive,
                          kernel = 'gaussian',
                          adaptive = TRUE,
                          longlat = FALSE)
The code below can be used to display the model output.
gwr_adaptive
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-10-17 00:08:38.782278
Call:
gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD +
PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA +
PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL +
PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD,
data = condo_resale_sf, bw = bw.adaptive, kernel = "gaussian",
adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: SELLING_PRICE
Independent variables: AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
Number of data points: 1436
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-3470778 -298119 -23481 248917 12234210
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 527633.22 108183.22 4.877 1.20e-06 ***
AREA_SQM 12777.52 367.48 34.771 < 2e-16 ***
AGE -24687.74 2754.84 -8.962 < 2e-16 ***
PROX_CBD -77131.32 5763.12 -13.384 < 2e-16 ***
PROX_CHILDCARE -318472.75 107959.51 -2.950 0.003231 **
PROX_ELDERLYCARE 185575.62 39901.86 4.651 3.61e-06 ***
PROX_URA_GROWTH_AREA 39163.25 11754.83 3.332 0.000885 ***
PROX_MRT -294745.11 56916.37 -5.179 2.56e-07 ***
PROX_PARK 570504.81 65507.03 8.709 < 2e-16 ***
PROX_PRIMARY_SCH 159856.14 60234.60 2.654 0.008046 **
PROX_SHOPPING_MALL -220947.25 36561.83 -6.043 1.93e-09 ***
PROX_BUS_STOP 682482.22 134513.24 5.074 4.42e-07 ***
NO_Of_UNITS -245.48 87.95 -2.791 0.005321 **
FAMILY_FRIENDLY 146307.58 46893.02 3.120 0.001845 **
FREEHOLD 350599.81 48506.48 7.228 7.98e-13 ***
---Significance stars
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 756000 on 1421 degrees of freedom
Multiple R-squared: 0.6507
Adjusted R-squared: 0.6472
F-statistic: 189.1 on 14 and 1421 DF, p-value: < 2.2e-16
***Extra Diagnostic information
Residual sum of squares: 8.120609e+14
Sigma(hat): 752522.9
AIC: 42966.76
AICc: 42967.14
BIC: 41731.39
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Adaptive bandwidth: 30 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu.
Intercept -1.3487e+08 -2.4669e+05 7.7928e+05 1.6194e+06
AREA_SQM 3.3188e+03 5.6285e+03 7.7825e+03 1.2738e+04
AGE -9.6746e+04 -2.9288e+04 -1.4043e+04 -5.6119e+03
PROX_CBD -2.5330e+06 -1.6256e+05 -7.7242e+04 2.6624e+03
PROX_CHILDCARE -1.2790e+06 -2.0175e+05 8.7158e+03 3.7778e+05
PROX_ELDERLYCARE -1.6212e+06 -9.2050e+04 6.1029e+04 2.8184e+05
PROX_URA_GROWTH_AREA -7.2686e+06 -3.0350e+04 4.5869e+04 2.4613e+05
PROX_MRT -4.3781e+07 -6.7282e+05 -2.2115e+05 -7.4593e+04
PROX_PARK -2.9020e+06 -1.6782e+05 1.1601e+05 4.6572e+05
PROX_PRIMARY_SCH -8.6418e+05 -1.6627e+05 -7.7853e+03 4.3222e+05
PROX_SHOPPING_MALL -1.8272e+06 -1.3175e+05 -1.4049e+04 1.3799e+05
PROX_BUS_STOP -2.0579e+06 -7.1461e+04 4.1104e+05 1.2071e+06
NO_Of_UNITS -2.1993e+03 -2.3685e+02 -3.4699e+01 1.1657e+02
FAMILY_FRIENDLY -5.9879e+05 -5.0927e+04 2.6173e+04 2.2481e+05
FREEHOLD -1.6340e+05 4.0765e+04 1.9023e+05 3.7960e+05
Max.
Intercept 18758355
AREA_SQM 23064
AGE 13303
PROX_CBD 11346650
PROX_CHILDCARE 2892127
PROX_ELDERLYCARE 2465671
PROX_URA_GROWTH_AREA 7384059
PROX_MRT 1186242
PROX_PARK 2588497
PROX_PRIMARY_SCH 3381462
PROX_SHOPPING_MALL 38038564
PROX_BUS_STOP 12081592
NO_Of_UNITS 1010
FAMILY_FRIENDLY 2072414
FREEHOLD 1813995
************************Diagnostic information*************************
Number of data points: 1436
Effective number of parameters (2trace(S) - trace(S'S)): 350.3088
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1085.691
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 41982.22
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41546.74
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 41914.08
Residual sum of squares: 2.528227e+14
R-square value: 0.8912425
Adjusted R-square value: 0.8561185
***********************************************************************
Program stops at: 2024-10-17 00:08:40.196155
The report shows that the AICc of the adaptive bandwidth GWR is 41982.22, which is even smaller than the AICc of 42263.61 for the fixed bandwidth GWR.
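The three AICc values quoted in the reports can be placed side by side with a few lines of base R. The numbers below are copied from the outputs above; the object names are only for this illustration.

```r
# AICc values copied from the model reports above
aicc <- c(global_mlr   = 42967.14,
          gwr_fixed    = 42263.61,
          gwr_adaptive = 41982.22)

# Improvement of each model over the global regression (larger = better)
delta_vs_global <- aicc[["global_mlr"]] - aicc
print(round(delta_vs_global, 2))

# Model with the lowest AICc
best_model <- names(which.min(aicc))
print(best_model)
```

The fixed bandwidth GWR lowers the AICc by 703.53 relative to the global model, and the adaptive bandwidth GWR lowers it by 984.92, so the adaptive model has the lowest AICc of the three.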
9.2.1 Visualising GWR Output
In addition to regression residuals, the output feature class table includes fields for observed and predicted y values, condition number (cond), Local R2, residuals, and explanatory variable coefficients and standard errors:
- Condition Number: This diagnostic assesses local collinearity in the model. When local collinearity is high, the results may become unstable. A condition number greater than 30 suggests that the results may be unreliable.
- Local R²: This metric ranges from 0.0 to 1.0 and indicates how well the local regression model fits the observed y values. Low values suggest poor model performance in certain areas. Mapping Local R² can highlight where the Geographically Weighted Regression (GWR) performs well or poorly, offering insights into potentially missing variables.
- Predicted Values: These are the estimated y values generated by the GWR model, representing the fitted values.
- Residuals: Residuals are calculated by subtracting the predicted y values from the observed y values. Standardized residuals, which have a mean of zero and a standard deviation of 1, can be visualized on a cold-to-hot color scale, indicating areas of under- or over-prediction.
- Coefficient Standard Error: This measures the reliability of each coefficient estimate. Smaller standard errors relative to the coefficient values suggest greater confidence in the estimates, while large standard errors may indicate issues with local collinearity.
These fields are all stored in the SDF element of the output list. SDF is a SpatialPointsDataFrame or SpatialPolygonsDataFrame object whose “data” slot holds the fit points, GWR coefficient estimates, observed y values, predicted values, coefficient standard errors and t-values.
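The relationship between a coefficient, its standard error and its t-value can be illustrated with a short sketch. It uses the first three AGE_SE and AGE_TV values from the output table in section 9.2.2 and assumes that the t-value is the coefficient estimate divided by its standard error. The 1.96 cut-off is an illustrative choice for an approximate 5% two-sided test and makes no adjustment for multiple testing.

```r
# First three local results for AGE, copied from the output table in section 9.2.2
local_age <- data.frame(
  AGE_SE = c(5889.782, 6226.916, 6510.236),
  AGE_TV = c(-1.6154474, -9.3441881, -4.1023685)
)

# Assuming t = estimate / SE, the estimate can be recovered as SE * t
local_age$AGE_coef <- local_age$AGE_SE * local_age$AGE_TV

# Flag locations where |t| exceeds the illustrative 1.96 cut-off
local_age$significant <- abs(local_age$AGE_TV) > 1.96

local_age
```

Mapping a flag like this alongside the raw coefficient helps separate areas where the local effect is distinguishable from zero from areas where it is not.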
9.2.2 Converting SDF into sf data.frame
To visualise the fields in SDF, we need to first convert it into sf data.frame by using the code chunk below:
gwr_adaptive_output <- as.data.frame(
  gwr_adaptive$SDF) %>%
  select(-c(2:12)) # drop columns 2 to 12
gwr_sf_adaptive <- cbind(condo_resale_sf,
                         gwr_adaptive_output)
Next, glimpse() is used to display the content of the gwr_sf_adaptive sf data frame.
glimpse(gwr_sf_adaptive)
Rows: 1,436
Columns: 66
$ nb <nb> <66, 77, 123, 238, 239, 343>, <21, 162, 163, 19…
$ wt <list> <0.1666667, 0.1666667, 0.1666667, 0.1666667, …
$ POSTCODE <dbl> 118635, 288420, 267833, 258380, 467169, 466472…
$ SELLING_PRICE <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ AREA_SQM <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 1…
$ AGE <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22,…
$ PROX_CBD <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783…
$ PROX_CHILDCARE <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543…
$ PROX_ELDERLYCARE <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.…
$ PROX_URA_GROWTH_AREA <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.4106…
$ PROX_HAWKER_MARKET <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969…
$ PROX_KINDERGARTEN <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076…
$ PROX_MRT <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.…
$ PROX_PARK <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.…
$ PROX_PRIMARY_SCH <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.…
$ PROX_TOP_PRIMARY_SCH <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.…
$ PROX_SHOPPING_MALL <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.…
$ PROX_SUPERMARKET <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.…
$ PROX_BUS_STOP <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340…
$ NO_Of_UNITS <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34…
$ FAMILY_FRIENDLY <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0…
$ FREEHOLD <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ MLR_RES <dbl> -1489099.55, 415494.57, 194129.69, 1088992.71,…
$ Intercept <dbl> 2050011.67, 1633128.24, 3433608.17, 234358.91,…
$ NO_Of_UNITS.1 <dbl> 104.8290640, -288.3441183, -9.5532945, -161.35…
$ FAMILY_FRIENDLY.1 <dbl> -9075.370, 310074.664, 5949.746, 1556178.531, …
$ FREEHOLD.1 <dbl> 303955.61, 396221.27, 168821.75, 1212515.58, 3…
$ y <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ yhat <dbl> 2886531.8, 3466801.5, 3616527.2, 5435481.6, 13…
$ residual <dbl> 113468.16, 413198.52, -291527.20, -1185481.63,…
$ CV_Score <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Stud_residual <dbl> 0.38207013, 1.01433140, -0.83780678, -2.846146…
$ Intercept_SE <dbl> 516105.5, 488083.5, 963711.4, 444185.5, 211962…
$ AREA_SQM_SE <dbl> 823.2860, 825.2380, 988.2240, 617.4007, 1376.2…
$ AGE_SE <dbl> 5889.782, 6226.916, 6510.236, 6010.511, 8180.3…
$ PROX_CBD_SE <dbl> 37411.22, 23615.06, 56103.77, 469337.41, 41064…
$ PROX_CHILDCARE_SE <dbl> 319111.1, 299705.3, 349128.5, 304965.2, 698720…
$ PROX_ELDERLYCARE_SE <dbl> 120633.34, 84546.69, 129687.07, 127150.69, 327…
$ PROX_URA_GROWTH_AREA_SE <dbl> 56207.39, 76956.50, 95774.60, 470762.12, 47433…
$ PROX_MRT_SE <dbl> 185181.3, 281133.9, 275483.7, 279877.1, 363830…
$ PROX_PARK_SE <dbl> 205499.6, 229358.7, 314124.3, 227249.4, 364580…
$ PROX_PRIMARY_SCH_SE <dbl> 152400.7, 165150.7, 196662.6, 240878.9, 249087…
$ PROX_SHOPPING_MALL_SE <dbl> 109268.8, 98906.8, 119913.3, 177104.1, 301032.…
$ PROX_BUS_STOP_SE <dbl> 600668.6, 410222.1, 464156.7, 562810.8, 740922…
$ NO_Of_UNITS_SE <dbl> 218.1258, 208.9410, 210.9828, 361.7767, 299.50…
$ FAMILY_FRIENDLY_SE <dbl> 131474.73, 114989.07, 146607.22, 108726.62, 16…
$ FREEHOLD_SE <dbl> 115954.0, 130110.0, 141031.5, 138239.1, 210641…
$ Intercept_TV <dbl> 3.9720784, 3.3460017, 3.5629010, 0.5276150, 1.…
$ AREA_SQM_TV <dbl> 11.614302, 20.087361, 13.247868, 33.577223, 4.…
$ AGE_TV <dbl> -1.6154474, -9.3441881, -4.1023685, -15.524301…
$ PROX_CBD_TV <dbl> -3.22582173, -6.32792021, -4.62353528, 5.17080…
$ PROX_CHILDCARE_TV <dbl> 1.000488185, 1.471786337, -0.344047555, 1.5766…
$ PROX_ELDERLYCARE_TV <dbl> -3.26126929, 3.84626245, 4.13191383, 2.4756745…
$ PROX_URA_GROWTH_AREA_TV <dbl> -2.846248368, -1.848971738, -2.648105057, -5.6…
$ PROX_MRT_TV <dbl> -1.61864578, -8.92998600, -3.40075727, -7.2870…
$ PROX_PARK_TV <dbl> -0.83749312, 2.28192684, 0.66565951, -3.340617…
$ PROX_PRIMARY_SCH_TV <dbl> 1.59230221, 6.70194543, 2.90580089, 12.9836104…
$ PROX_SHOPPING_MALL_TV <dbl> 2.753588422, -0.886626400, -1.056869486, -0.16…
$ PROX_BUS_STOP_TV <dbl> 2.0154464, 4.4941192, 3.0419145, 12.8383775, 0…
$ NO_Of_UNITS_TV <dbl> 0.480589953, -1.380026395, -0.045279967, -0.44…
$ FAMILY_FRIENDLY_TV <dbl> -0.06902748, 2.69655779, 0.04058290, 14.312764…
$ FREEHOLD_TV <dbl> 2.6213469, 3.0452799, 1.1970499, 8.7711485, 1.…
$ Local_R2 <dbl> 0.8846744, 0.8899773, 0.8947007, 0.9073605, 0.…
$ geometry <POINT [m]> POINT (22085.12 29951.54), POINT (25656.…
$ geometry.1 <POINT [m]> POINT (22085.12 29951.54), POINT (25656.…
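The conversion step drops columns by position with select(-c(2:12)), and the resulting table also carries a duplicated geometry.1 column. As a hedged alternative, columns can be dropped by name, which keeps working if the column order changes. The sketch below uses a hypothetical toy data frame: the column names are borrowed from the output above, but the values are placeholders.

```r
# Hypothetical toy table; names borrowed from the glimpse() output, values are placeholders
toy <- data.frame(
  Intercept  = 1,
  AREA_SQM   = 2,
  AGE        = 3,
  yhat       = 4,
  geometry.1 = 5
)

# Columns to remove, listed by name instead of by position
drop_cols <- c("AREA_SQM", "AGE", "geometry.1")

# Keep every column whose name is not in drop_cols
kept <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]

names(kept)
```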
The summary() function is used in the code chunk below to display summary statistics of the predicted values (yhat).
summary(gwr_adaptive$SDF$yhat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
  171347  1102001  1385528  1751842  1982307 13887901
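To connect the predicted values with the residual field described earlier, the sketch below recomputes residuals as observed minus predicted for the first four observations listed in the glimpse() output. The RMSE at the end is an extra illustration and is not reported anywhere in the exercise.

```r
# First four observed (y) and predicted (yhat) values from the glimpse() output
obs  <- c(3000000, 3880000, 3325000, 4250000)
pred <- c(2886531.8, 3466801.5, 3616527.2, 5435481.6)

# Residual = observed - predicted; positive means the model under-predicts
res <- obs - pred

# Root mean squared error over these four points only (illustrative)
rmse <- sqrt(mean(res^2))

res
rmse
```

The recomputed values agree with the residual column shown in the output, up to the rounding of the displayed figures.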
9.2.3 Visualising local R2
The code chunk below is used to create an interactive point symbol map.
tmap_mode("view")
tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)
tm_shape(mpsz) +
  tm_polygons(alpha = 0.1) +
  tm_shape(gwr_sf_adaptive) +
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11, 14))
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Switching the mode back to plot
tmap_mode("plot")
tmap mode set to plotting
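The warning in this section reports that mpsz contains invalid geometries and points to sf::st_is_valid. A minimal sketch of how such geometries could be detected and repaired is given below. It uses a hypothetical self-intersecting "bow-tie" polygon rather than the actual mpsz layer, and st_make_valid() is one possible repair, not necessarily the one intended by this exercise.

```r
library(sf)

# Hypothetical toy layer: a self-intersecting "bow-tie" ring (not the real mpsz data)
bowtie <- st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0))))
toy <- st_sf(id = 1, geometry = st_sfc(bowtie))

# st_is_valid() returns one logical value per feature
st_is_valid(toy)

# st_make_valid() attempts to repair invalid geometries
toy_fixed <- st_make_valid(toy)
st_is_valid(toy_fixed)
```

Applied to this exercise, the same two calls would be run on mpsz before mapping, as an alternative to relying on tmap_options(check.and.fix = TRUE).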
9.2.4 Visualising Coefficient Estimates
The code chunk below creates two synchronised interactive point symbol maps, one showing the standard errors and one showing the t-values of the AREA_SQM coefficient estimates.
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tmap mode set to interactive viewing
AREA_SQM_SE <- tm_shape(mpsz) +
  tm_polygons(alpha = 0.1) +
  tm_shape(gwr_sf_adaptive) +
  tm_dots(col = "AREA_SQM_SE",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11, 14))
AREA_SQM_TV <- tm_shape(mpsz) +
  tm_polygons(alpha = 0.1) +
  tm_shape(gwr_sf_adaptive) +
  tm_dots(col = "AREA_SQM_TV",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11, 14))
tmap_arrange(AREA_SQM_SE, AREA_SQM_TV,
             asp = 1, ncol = 2,
             sync = TRUE)
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Switching the mode back to plot
tmap_mode("plot")
tmap mode set to plotting
9.2.5 Visualising by URA Planning Region
tm_shape(mpsz[mpsz$REGION_N == "CENTRAL REGION", ]) +
  tm_polygons() +
  tm_shape(gwr_sf_adaptive) +
  tm_bubbles(col = "Local_R2",
             size = 0.15,
             border.col = "gray60",
             border.lwd = 1)
Warning: The shape mpsz[mpsz$REGION_N == "CENTRAL REGION", ] is invalid. See sf::st_is_valid
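The chunk above subsets the polygons to the Central Region but passes the full gwr_sf_adaptive point layer to tm_bubbles(). If only the points falling inside the selected region are wanted, a spatial filter could be applied first. The sketch below is a toy illustration with hypothetical squares and points (not the exercise data), using sf::st_filter() with its default intersects predicate.

```r
library(sf)

# Helper that builds a unit square whose lower-left corner is at (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Hypothetical regions standing in for the planning regions in mpsz
regions <- st_sf(
  REGION_N = c("CENTRAL REGION", "EAST REGION"),
  geometry = st_sfc(unit_square(0), unit_square(1))
)

# Hypothetical points standing in for gwr_sf_adaptive
pts <- st_sf(
  Local_R2 = c(0.88, 0.91, 0.85),
  geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(0.2, 0.8)), st_point(c(1.5, 0.5)))
)

# Keep only the points that intersect the Central Region polygon
central <- regions[regions$REGION_N == "CENTRAL REGION", ]
pts_central <- st_filter(pts, central)

nrow(pts_central)
```

The filtered layer could then be passed to tm_shape() in place of the full point layer, so that the map shows only transactions inside the chosen region.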

10 Conclusion
This in-class exercise primarily uses the sfdep package, whereas the hands-on exercise used the spdep package.
END